proprietary data
QATCH: Benchmarking SQL-centric tasks with Table Representation Learning Models on Your Data
Table Representation Learning (TRL) models are commonly pre-trained on large open-domain datasets comprising millions of tables and then used to address downstream tasks. Choosing the right TRL model to use on proprietary data can be challenging, as the best results depend on the content domain, schema, and data quality. Our purpose is to support end-users in testing TRL models on proprietary data in two established SQL-centric tasks, i.e., Question Answering (QA) and Semantic Parsing (SP). We present QATCH (Query-Aided TRL Checklist), a toolbox to highlight TRL models' strengths and weaknesses on relational tables unseen at training time. For an input table, QATCH automatically generates a testing checklist tailored to QA and SP. Checklist generation is driven by a SQL query engine that crafts tests of different complexity. This design facilitates inherent portability, allowing the checks to be used by alternative models. We also introduce a set of cross-task performance metrics evaluating the TRL model's performance over its output. Finally, we show how QATCH automatically generates tests for proprietary datasets to evaluate various state-of-the-art models including TAPAS, TAPEX, and CHATGPT.
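The abstract describes checklist generation driven by a SQL query engine that crafts tests of different complexity and executes them to obtain gold answers. A minimal sketch of that idea is below; the templates and function name are illustrative assumptions, not QATCH's actual implementation.

```python
import sqlite3
import pandas as pd

def generate_checklist(df: pd.DataFrame, table_name: str = "t"):
    """Generate (question, SQL, gold answer) test triples of increasing
    complexity from an input table. Each SQL query is executed against the
    table so the model's output can later be checked against a gold answer.
    Templates here are illustrative, not QATCH's actual test suite."""
    conn = sqlite3.connect(":memory:")
    df.to_sql(table_name, conn, index=False)
    numeric = df.select_dtypes("number").columns
    tests = []
    # Simplest complexity level: project a single column.
    for col in df.columns:
        tests.append((f"Show all values of {col}.",
                      f"SELECT {col} FROM {table_name}"))
    # Next level: aggregate over a numeric column.
    for col in numeric:
        tests.append((f"What is the maximum {col}?",
                      f"SELECT MAX({col}) FROM {table_name}"))
    # Execute each query to obtain the gold answer for cross-task scoring.
    return [(q, sql, conn.execute(sql).fetchall()) for q, sql in tests]

checklist = generate_checklist(
    pd.DataFrame({"city": ["Turin", "Seattle"],
                  "population": [870_000, 750_000]}))
```

Because the checks are plain natural-language questions paired with SQL, the same checklist can be fed to any QA or SP model, which is the portability property the abstract highlights.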
Using a Large Language Model to generate a Design Structure Matrix
DSM is known for its simplicity and conciseness in representation and exists in the form of a square matrix that maps the relationships between the set of system elements [Yassine and Braha 2003; Browning 2015]. An example DSM (n = 4) is shown in Figure 1. Based on the DSM convention described by Browning [2001], Element 1 depends on Element 2, as indicated by a red cell entry in row 2, column 1 of the DSM. Likewise, Element 4 depends on Element 3, as indicated in row 3, column 4. The diagonal of the DSM maps each element to itself and is indicated as black cells in Figure 1. The diagonal is usually left empty but is sometimes used as a space to store element-specific data, such as the likelihood of changing the given element based on market projection [Koh et al. 2013]. The DSM in Figure 1 is not symmetrical across the diagonal, indicating asymmetrical dependencies between the system elements. For example, Element 1 depends on Element 2, but Element 2 does not depend on Element 1. In contrast, the example DSM shows that Element 2 and Element 4 have a symmetrical interdependency. It is important to note that a transposed version of the DSM convention is also widely adopted by many (e.g.
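The Figure 1 example can be sketched as a small matrix. This follows Browning's convention as described above (a mark in row i, column j means element j depends on element i); the dependencies encoded here are only the ones named in the text.

```python
import numpy as np

n = 4  # number of system elements in the Figure 1 example
dsm = np.zeros((n, n), dtype=int)

# Browning's convention: dsm[i, j] == 1 means Element j+1 depends on Element i+1
# (rows and columns are 0-indexed here, 1-indexed in the text).
dsm[1, 0] = 1  # Element 1 depends on Element 2 (row 2, column 1)
dsm[2, 3] = 1  # Element 4 depends on Element 3 (row 3, column 4)
dsm[1, 3] = 1  # Elements 2 and 4 have a symmetrical interdependency
dsm[3, 1] = 1

np.fill_diagonal(dsm, 0)  # the diagonal is usually left empty

# Asymmetry: Element 1 depends on Element 2, but not vice versa.
assert dsm[1, 0] == 1 and dsm[0, 1] == 0
# Symmetry: Elements 2 and 4 depend on each other.
assert dsm[1, 3] == dsm[3, 1] == 1
```

The transposed convention mentioned at the end of the paragraph would simply swap the roles of rows and columns, i.e. use `dsm.T`.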
Protecting Publicly Available Data With Machine Learning Shortcuts
Müller, Nicolas M., Burgert, Maximilian, Debus, Pascal, Williams, Jennifer, Sperl, Philip, Böttinger, Konstantin
Machine-learning (ML) shortcuts or spurious correlations are artifacts in datasets that lead to very good training and test performance but severely limit the model's generalization capability. Such shortcuts are insidious because they go unnoticed due to good in-domain test performance. In this paper, we explore the influence of different shortcuts and show that even simple shortcuts are difficult to detect by explainable AI methods. We then exploit this fact and design an approach to defend online databases against crawlers: providers such as dating platforms, clothing manufacturers, or used car dealers have to deal with a professionalized crawling industry that grabs and resells data points on a large scale. We show that a deterrent can be created by deliberately adding ML shortcuts. Such augmented datasets are then unusable for ML use cases, which deters crawlers and the unauthorized use of data from the internet. Using real-world data from three use cases, we show that the proposed approach renders such collected data unusable, while the shortcut is at the same time difficult to notice in human perception. Thus, our proposed approach can serve as a proactive protection against illegitimate data crawling.
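The defense described above amounts to injecting a spurious correlation that a crawler's model will latch onto. A hedged sketch of the idea, not the paper's actual method: embed a small, label-dependent pixel pattern into each image so that a model trained on the crawled data learns the artifact instead of the task.

```python
import numpy as np

def add_shortcut(images: np.ndarray, labels: np.ndarray,
                 strength: int = 4) -> np.ndarray:
    """Embed a subtle, label-dependent artifact into each image so a model
    trained on the data can 'solve' the task from the artifact alone.
    Illustrative sketch: nudge a 2x2 corner patch by an amount tied to the
    class label -- small enough to be hard to notice in human perception,
    consistent enough to act as a shortcut."""
    out = images.copy()
    for img, y in zip(out, labels):
        patch = img[:2, :2].astype(int) + strength * (int(y) + 1)
        img[:2, :2] = np.clip(patch, 0, 255).astype(img.dtype)
    return out

rng = np.random.default_rng(0)
imgs = rng.integers(0, 256, size=(8, 28, 28), dtype=np.uint8)
ys = rng.integers(0, 2, size=8)
poisoned = add_shortcut(imgs, ys)
```

A model trained on `poisoned` can reach high in-domain accuracy by reading the corner patch, yet fails on clean data, which is exactly what makes the crawled dataset unusable for downstream ML.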
Michael Jayawardana on LinkedIn: #bloomberggpt #artificialintelligence
Bloomberg's announcement that it created a ChatGPT-like large language model focused on finance created a bit of a stir. "BloombergGPT AI may be the harbinger of the next wave of corporate AI," Ethan Mollick, a professor at Wharton, tweeted. He noted that building models is all about the training data and Bloomberg enjoyed the advantage of including proprietary data about finance as well as general information scraped from the Web. Reading the Bloomberg research paper provides some insight into the strange terrain where we find ourselves. Among other things, Bloomberg used a data set called "Enron Emails."
NVIDIA Unveils Large Language Models and Generative AI Service to Advance Life Sciences R&D
GTC--NVIDIA today announced an expanded set of generative AI cloud services for customizing AI foundation models to accelerate the creation of new proteins and therapeutics, as well as research in the fields of genomics, chemistry, biology and molecular dynamics. Part of NVIDIA AI Foundations, the new BioNeMo Cloud service offering -- for both AI model training and inference -- accelerates the most time-consuming and costly stages of drug discovery. It enables researchers to fine-tune generative AI applications on their own proprietary data, and to run AI model inference directly in a web browser or through new cloud application programming interfaces (APIs) that easily integrate into existing applications. "The transformative power of generative AI holds enormous promise for the life science and pharmaceutical industries," said Kimberly Powell, vice president of healthcare at NVIDIA. "NVIDIA's long collaboration with pioneers in the field has led to the development of BioNeMo Cloud Service, which is already serving as an AI drug discovery laboratory. It provides pretrained models and allows customization of models with proprietary data that serve every stage of the drug-discovery pipeline, helping researchers identify the right target, design molecules and proteins, and predict their interactions in the body to develop the best drug candidate."
NVIDIA unveils AI Foundations, its customizable Gen-AI cloud service
The age of enterprise AI has come crashing down upon us in recent months. Public infatuation with ChatGPT since its release last November has opened the floodgates of corporate interest and set off an industry-wide land grab, with every major tech entity vying to stake its claim in this burgeoning market by incorporating generative AI features into existing products. Heavyweights including Google, Microsoft, Meta, and Baidu are already jockeying their Large Language Models (LLMs) for market dominance, while everybody else, from Adobe and AT&T to BMW and BYD, scrambles to find uses for the revolutionary technology. NVIDIA's newest cloud services offering, AI Foundations, will allow businesses lacking the time and money to develop their own models from scratch "to build, refine and operate custom large language models and generative AI models that are trained with their own proprietary data and created for their unique domain-specific tasks." These services include NeMo, NVIDIA's large language model customization service; BioNeMo, a drug and molecule discovery-focused fork of the NeMo model built for the medical research community; and Picasso, an AI capable of generating images, video and "3D applications… to supercharge productivity for creativity, design and digital simulation," according to Tuesday's release.
Who Will Make Money from the Generative AI Gold Rush? Part I
BigTech companies already dominate GenAI infrastructure with their cloud services and hardware chips. Microsoft and Google are well-positioned in the US cloud market, while Baidu and Alibaba are well-positioned in China. Their massive supercomputer cloud infrastructure is engineered to run GenAI's complex, expensive, large text, visual, and audio Foundational Models. Many developers already use their cloud AI API services and tools to build apps, and this trend is expected to accelerate as entrepreneurs rush to address virtually limitless GenAI use cases. Amazon has been quiet on Foundational Models, so a big question is how it will respond. GenAI uses massive amounts of computational power to generate creative outputs.